-
Notifications
You must be signed in to change notification settings - Fork 25.6k
Re-enable use_doc_values_skipper and add specialized lucene query for @timestamp field filtering.
#127260
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Re-enable use_doc_values_skipper and add specialized lucene query for @timestamp field filtering.
#127260
Conversation
d5bd0d1 to
21075f1
Compare
506cbfc to
d2cf578
Compare
|
Elastic/logs run comparing main (baseline) to this branch (contender): The |
|
Looks the large regression are not caused by range queries on timestamp field ( A lot of time is spent at I suspect some specialization needs to be build for |
Additionally, updated the filter-by-filter optimization to make use of the new TimestampQuery.
| return doc = target; | ||
| case MAYBE: | ||
| if (target > innerApproximation.docID()) { | ||
| target = innerApproximation.advance(target); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would expect to advance here the timestamps iterator instead of the innerApproximation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see now that it is equivalent. Would it make sense to make innerApproximation a NumericDocValues and remove the reference from the parent class?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done: 24819c4
| case YES -> true; | ||
| case MAYBE -> { | ||
| final long value = timestamps.longValue(); | ||
| yield value >= minTimestamp && value <= maxTimestamp; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if value < minTimestamp, we know we will not be matching any more documents, maybe we can move approximation.match to NO?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although this is only true if we are always in the same primary sort.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we should have two MAYBE options, one that we know we are always in the same primary sort and another one with mixed primary sort. In that case we can apply the optimization in the first case (which should be the common case).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting, let me take a look at this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done: 5739565
|
fwiw: apache/lucene#14672 |
remove timestamp field from TimestampTwoPhaseIterator
c4dcd96 to
7ca9a34
Compare
Relates to #127263